The goal of this paper is to analyze an intriguing phenomenon recently discovered in deep networks, namely their instability to adversarial perturbations (Szegedy et al., 2014). We provide a theoretical framework for analyzing the robustness of classifiers to adversarial perturbations, and show fundamental upper bounds on this robustness. Specifically, we establish a general upper bound on the robustness of classifiers to adversarial perturbations, and then illustrate the obtained bound for the families of linear and quadratic classifiers. In both cases, our upper bound depends on a distinguishability measure that captures the difficulty of the classification task. Our results for both classes imply that in tasks involving small distinguishability, no classifier in the considered family will be robust to adversarial perturbations, even if good accuracy is achieved. Our theoretical framework moreover suggests that the phenomenon of adversarial instability is due to the low flexibility of classifiers relative to the difficulty of the classification task (as captured by the distinguishability). We also show a clear distinction between the robustness of a classifier to random noise and its robustness to adversarial perturbations: for linear classifiers, the former is larger than the latter by a factor proportional to $\sqrt{d}$ (with $d$ the signal dimension). This result gives a theoretical explanation for the discrepancy between the two robustness properties in high-dimensional problems, which was empirically observed in the context of neural networks. To the best of our knowledge, our results provide the first theoretical analysis of the phenomenon of adversarial instability recently observed for deep networks. Our analysis is complemented by experimental results on controlled and real-world data.
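The $\sqrt{d}$ gap between random-noise and adversarial robustness for linear classifiers can be checked numerically. The following is a minimal sketch (not the paper's code; the setup, variable names, and the use of the median over random directions are our own illustrative assumptions) that compares, for a linear classifier $f(x) = w^\top x + b$, the smallest worst-case perturbation flipping the decision with the typical magnitude of a random perturbation needed to do the same.

```python
# Sketch: sqrt(d) gap between adversarial and random-noise robustness
# of a linear classifier f(x) = w^T x + b. Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
d = 10_000                      # signal dimension
w = rng.standard_normal(d)      # classifier weights
b = 0.0
x = rng.standard_normal(d)      # a test point

margin = abs(w @ x + b)

# Worst-case (adversarial) robustness: smallest L2 perturbation that flips
# the sign of f; for a linear classifier this is |f(x)| / ||w||_2.
r_adv = margin / np.linalg.norm(w)

def random_flip_norm(w, x, b, rng):
    """Magnitude of perturbation along a random unit direction that flips f."""
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    # Moving by t*v changes f by t * (w @ v), so |t| = |f(x)| / |w @ v|.
    return abs(w @ x + b) / abs(w @ v)

# Typical (median) random-noise robustness over many random directions.
r_rand = np.median([random_flip_norm(w, x, b, rng) for _ in range(200)])

print(f"adversarial robustness : {r_adv:.3f}")
print(f"random-noise robustness: {r_rand:.3f}")
print(f"ratio r_rand / r_adv   : {r_rand / r_adv:.1f} (order of sqrt(d) = {np.sqrt(d):.1f})")
```

The ratio printed in the last line grows on the order of $\sqrt{d}$: a random direction is nearly orthogonal to $w$ in high dimension, so a much larger perturbation is needed to cross the decision boundary than along the worst-case direction $w$.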